ABSTRACT: This article provides a thorough overview of a wide range of
advanced statistical methods that have found extensive and resilient
applications in the spatial modeling of variables in a geographical
information system (GIS) platform. The success of these approaches can
be attributed to a convergence of speed, dependability, precision, and
an inherent eco-consciousness that together reshape environmental data
analysis. These models have outperformed conventional methods in current
scientific investigation and environmental analysis, becoming a hallmark
of innovative research and decision-making procedures. These approaches
use data efficiently by accommodating reduced sample sizes. This not
only saves resources but also aligns with the ethical imperative of
minimizing environmental effects wherever possible. Furthermore,
combining these statistical techniques with GIS has greatly expanded
their utility. The combination helps to discover deep spatial linkages,
extrapolate trends, and translate findings into actionable insights that
are relatable across disciplines. These approaches encompass not only
predictive modeling but also error assessment and efficiency evaluation.
In conclusion, the adoption of these statistical methods is useful in
facilitating sound decision-making in environmental studies. Application
domains include soil properties, air quality parameters, vegetation
distribution, land cover and land use, water quality parameters,
temperature and climate variables, natural hazards, urban infrastructure
planning, ecological habitats, noise pollution levels, and radiation and
exposure assessment. As scientific growth unfolds, these techniques will
help direct researchers, practitioners, and policymakers toward a future
where empirical accuracy and environmental consciousness meet
synergistically.

1. Introduction

Geostatistics, as a specialized branch of statistics, focuses on
analyzing, modeling, and interpreting spatial data, providing valuable
insights into spatial relationships and the variability of natural
phenomena across different locations in a GIS (Geographical Information
System) environment. GIS (Openshaw and Clarke, 2019; Wang and Liu, 2023)
is fundamentally a framework for gathering, organizing, analyzing, and
visualizing diverse types of geographic information. It enables users to
grasp detailed relationships, discover patterns, and uncover trends
within a geographic environment by seamlessly integrating data such as
maps, satellite imagery, topography data, and attribute data (on-ground
analysis data). It is a powerful visualization tool that translates
raw data into understandable maps and 
graphics by utilizing the knowledge of 
geography. Initially developed for estimating 
ore reserves in the mining industry, 
geostatistics has now found extensive 
application in diverse fields such as geology 
(Xu and Zhang, 2023), environmental science 
(Ghute et al., 2023), agriculture (Mathenge et 
al., 2022), hydrology (Demarquet et al., 2023), 
and many more. The process of constructing 
and analyzing mathematical or statistical 
representations of spatial relationships, trends, 
and variations within a geographic area is 
referred to as spatial modeling. It entails 
using data to construct models that capture 
the spatial distribution of phenomena such as 
environmental factors throughout a specific 
geographical location. These models seek to 
elucidate the underlying patterns, 
relationships and influences that govern the 
distribution of these occurrences. In the 
domain of soil science (Khallouf et al., 2020; 
Criado et al., 2021), geostatistics plays a 
crucial role in understanding the spatial 
variability of soil parameters. Soil, being a 
complex and heterogeneous medium, exhibits 
significant variations over short distances. 
The geostatistical analysis aids in 
characterizing spatial variability 
(AbdelRahman et al., 2020), creating spatial 
models (Zakeri and Mariethoz, 2021), and 
making reliable predictions (Kingsley et al., 
2019) of soil properties at unsampled 
locations. Modern geostatistical tools and
techniques, such as semivariograms, spatial
auto-correlograms, and various interpolation
approaches, are employed to assess the spatial
variability (Gökmen et al., 2023; Khan et al.,
2021) of soil properties.


In contrast, classical statistical techniques typically rely on
descriptive statistical tools like mean, median, mode, coefficient of
variation, etc., to measure soil property variability without
considering its spatial dependence on the sampling point. As a result,
they fail to adequately explain the continuous spatial variability
pattern. Key tools of geostatistics (Gangopadhyay and Reddy, 2022)
include the variogram, kriging interpolation, spatial uncertainty, and
cross-validation. The variogram is a fundamental concept that quantifies
the spatial correlation structure in the data by measuring half the
average squared difference in values between pairs of data points as a
function of their separation distance or lag. It helps determine the
range of spatial influence, identify trends, and select
appropriate interpolation methods. Variogram
models are commonly employed to describe 
the spatial correlation in the dataset. The 
interpretation of variograms (Fischer, 2019) 
involves three components: Sill, Range, and 
Nugget effect. The Sill represents the plateau 
or "sill" at large lag distances, signifying the 
maximum spatial variability. This plateau 
indicates the range of influence beyond which 
data points are not significantly correlated. 
The Range is the distance at which the 
variogram levels off, indicating the spatial 
correlation range of the soil parameter, with 
data points within this range showing a strong 
correlation. Lastly, the Nugget effect
represents the abrupt change in the variogram
at a lag distance of zero. It plays a crucial role in
geostatistics as it accounts for measurement
errors, microscale variability, or other factors
causing spatial variation at very small
distances. The variogram itself is visualized through a graph that
depicts the pattern of semivariance change
with varying distances between two sampling
points.

Figure 1: Methods to map spatial variability in parameters by using geostatistical techniques in
a GIS platform.

Figure 2: Software to study the spatial datasets.


Semivariance is calculated as half the average squared difference
between the values observed at pairs of points separated by a given
distance. It serves to assess spatial continuity or spatial
autocorrelation as a function of distance. When the sampling interval
between two locations is smaller than the range distance, the variable
is considered spatially autocorrelated. Consequently, the spatial
variability assessment of that variable becomes significant for its
proper management. The nugget-to-sill (N:S) ratio provides insight
into the degree of spatial dependence of a soil parameter. N:S values
of <0.25, 0.25-0.75, and >0.75 indicate strong, moderate, and weak
spatial dependence of a particular soil parameter, respectively.
Estimating variogram parameters (sill, range, 
and nugget effect) involves fitting various 
theoretical models to the experimental 
variogram. The choice of the model depends 
on the data and spatial characteristics of the 
soil parameter being analyzed. Commonly 
used variogram models (Molla et al., 2023) 
include the spherical, exponential, Gaussian, 
and power models. The spherical model is a
simple model that reaches the sill exactly at the range, giving a
sharp cutoff. Conversely, the exponential model is a smoother and
continuous model that gradually approaches the sill. The Gaussian
model shares similarities with the exponential model but exhibits a
more gradual increase at short lag distances. On the other hand, the
power model has no sill and is specifically useful for data displaying
power-law behavior, often employed for variograms with heavy-tailed
distributions. These models aid in capturing the spatial correlation
structure and are fundamental for accurate predictions and spatial
analysis (Mondal et al., 2021) in geostatistics.
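
The model shapes and the N:S classification described above can be sketched in a few lines of Python. This is a minimal illustration rather than the formulation used by any particular GIS package: the function names are hypothetical, `sill` is taken to mean the total sill (nugget plus partial sill), and the factor 3 in the exponential and Gaussian models is one common "practical range" convention that is assumed here.

```python
import math

def spherical(h, nugget, sill, range_):
    """Spherical model: reaches the sill exactly at h == range_."""
    if h == 0:
        return 0.0
    if h >= range_:
        return sill
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

def exponential(h, nugget, sill, range_):
    """Exponential model: approaches the sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / range_))

def gaussian(h, nugget, sill, range_):
    """Gaussian model: very gradual rise at short lags."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * (h / range_) ** 2))

def power(h, nugget, scale, exponent):
    """Power model: keeps growing with distance, so it has no sill."""
    if h == 0:
        return 0.0
    return nugget + scale * h ** exponent

def spatial_dependence(nugget, sill):
    """Classify the nugget-to-sill ratio with the thresholds quoted in the text."""
    ratio = nugget / sill
    if ratio < 0.25:
        return "strong"
    if ratio <= 0.75:
        return "moderate"
    return "weak"

# Hypothetical parameters: nugget 0.1, sill 1.0, range 10 distance units.
for h in (0.0, 2.5, 5.0, 10.0, 20.0):
    print(h, spherical(h, 0.1, 1.0, 10.0),
          exponential(h, 0.1, 1.0, 10.0),
          gaussian(h, 0.1, 1.0, 10.0))
print(spatial_dependence(0.1, 1.0))
```

Evaluating the three bounded models at the same lags makes the difference near the origin visible: the Gaussian curve starts flattest, the exponential curve rises fastest, and only the spherical curve sits exactly on the sill from the range onward.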
1.1 Error estimation 

To identify the most suitable model for a 
particular soil property, the selection process 
involves minimizing the error and 
maximizing the model's efficiency known as 
the error calculation or cross-validation. The 
correctness of the spatial model is checked 
with the error percentages. The "error 
percentage" provides a quantitative 
representation of the difference between 
predicted and observed values at unsampled 
locations. The predicted values include the 
model-generated value at a point, whereas the 
observed values are the recorded values at the 
location. It is also known as prediction error 
or estimation error, and it is used to assess the 
effectiveness of the chosen interpolation 
strategy. This broadly incorporates mean
absolute error (MAE), root mean square error
(RMSE), and mean squared prediction error
(MSPE). The mean absolute error (MAE)
calculates the average of absolute differences
between predicted and observed values,
providing a measure of typical error magnitude
while ignoring directional differences. The
root mean square error (RMSE), on the other
hand, is the square root of the average squared
difference between the two sets of values; it
also ignores direction but weights large errors
more heavily than MAE does.

Journal of soil, plant and Environment

Meanwhile, the mean squared
prediction error (MSPE) focuses on the
squared differences between them,
emphasizing bigger errors by squaring. These
error percentage measurements provide an
idea of the efficiency and precision of the
interpolation procedure. A smaller error
percentage indicates improved prediction
accuracy, whereas a higher error percentage
indicates a less accurate prediction. Various
cross-validation techniques, such as
leave-one-out cross-validation or k-fold
cross-validation, are commonly employed to
validate geostatistical models (Rajalakshimi
et al., 2023). In addition to the Gaussian,
exponential, and power models, geostatistics
offers various other methods and techniques
to enhance spatial analysis and prediction.
These methods aim to handle diverse data
structures and characteristics, providing
detailed insights into spatial variability
(Nagaraj et al., 2023) and the correlation of
soil properties. The whole analysis of datasets
is broadly divided into data exploration tools,
deterministic methods, geostatistical methods,
and interpolation with barrier methods, which
are described in the following sections.
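
The error measures and the leave-one-out procedure discussed in this subsection can be illustrated with a short Python sketch. The function names, the sample numbers, and the trivial "mean of the remaining points" predictor are all hypothetical; in practice the predictor would be the interpolation method under evaluation.

```python
import math

def mae(observed, predicted):
    """Mean absolute error: average size of the errors, sign ignored."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)

def mspe(observed, predicted):
    """Mean squared prediction error: squaring emphasizes large errors."""
    return sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error: square root of the MSPE, in data units."""
    return math.sqrt(mspe(observed, predicted))

def leave_one_out(coords, values, predict):
    """Predict each point from all other points and return the predictions."""
    predictions = []
    for i in range(len(values)):
        train_xy = coords[:i] + coords[i + 1:]
        train_z = values[:i] + values[i + 1:]
        predictions.append(predict(train_xy, train_z, coords[i]))
    return predictions

def mean_predictor(train_xy, train_z, target):
    """Placeholder predictor (assumption): ignores location entirely."""
    return sum(train_z) / len(train_z)

observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.0, 10.0]
print(mae(observed, predicted), mspe(observed, predicted), rmse(observed, predicted))

coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [1.0, 2.0, 3.0]
loo = leave_one_out(coords, values, mean_predictor)
print(loo, rmse(values, loo))
```

Because RMSE is expressed in the same units as the variable, it is usually the easier of the two squared measures to compare against MAE; a large gap between them hints at a few unusually large errors.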

2. Data exploration tools

These are the tools used to explore or understand the dataset in
detail. On the basis of their utility and properties they can be
further subdivided as histogram, normal Q-Q plot, Voronoi maps, trend
analysis, semivariogram, general Q-Q plot and cross variance cloud.

2.1 Histogram

A histogram is a graphical (Reza et al., 2016; Xu and Zhang, 2023)
representation of a dataset's distribution, offering a visual means to
comprehend the underlying frequency or probability distribution of
numerical data. The dataset is divided into intervals or bins, and the
height of each bar in the histogram corresponds to the frequency or
count of observations falling within that bin. Histograms prove
invaluable in identifying patterns, understanding central tendencies
and data spread, detecting outliers, and visualizing the overall shape
of the distribution. As such, they are commonly employed in
exploratory data analysis (EDA).
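
The binning step can be made concrete with a small sketch. It assumes equal-width bins and a dataset whose maximum exceeds its minimum; the function name and the sample values are hypothetical.

```python
def histogram(values, n_bins):
    """Return bin edges and the count of observations in each equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical measurements of a soil property.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]
edges, counts = histogram(sample, 4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:.1f}-{right:.1f}: " + "#" * count)
```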
2.2 A normal quantile-quantile (Q-Q) plot 
A quantile-quantile plot, often abbreviated 
as a normal Q-Q plot or simply a Q-Q plot, is 
a graphical tool used to assess (Othmani et al., 
2023; Wang and Liu, 2023) whether a dataset 
adheres to a normal distribution. It is 
achieved by comparing the quantiles of the 
dataset against the quantiles of a theoretical 
normal distribution. When the points on the 
Q-Q plot closely align along a straight line, it 
indicates that the data is approximately 
normally distributed. Conversely, deviations 
from the straight line in a specific pattern 
suggest the presence of skewness or heavy- 
tailed characteristics in the data. If the points 
on the plot exhibit a clear curvature or an "S" 
shape, it indicates significant non-normality. 
Q-Q plots are valuable for detecting
departures from normality and are commonly
employed in statistics, particularly during
EDA. They offer visual insights into the data
distribution and can aid in selecting
appropriate statistical techniques or deciding
on data transformations if normality
assumptions are necessary for a particular
analysis. In summary, Q-Q plots provide a
powerful tool for evaluating the conformity
of a dataset to a normal distribution and play
a vital role in statistical analyses.

Figure 3: From ground to final output (Source: Singh and Sarma, 2023).
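
The construction of a normal Q-Q plot can be sketched without any plotting library by computing the point pairs directly. The plotting position (i + 0.5)/n used below is one common choice and is an assumption here, as are the function names and the synthetic samples; the correlation of the point pairs is used only as a rough numerical stand-in for "how straight the plot looks".

```python
import math
from statistics import NormalDist

def normal_qq_points(values):
    """Pair each ordered observation with a standard normal quantile."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

def straightness(points):
    """Pearson correlation of the Q-Q points; close to 1 means nearly straight."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

# Synthetic samples: one built from normal quantiles, one strongly right-skewed.
normal_like = [NormalDist(10.0, 2.0).inv_cdf((i + 0.5) / 20) for i in range(20)]
skewed = [math.exp(3.0 * (v - 10.0) / 2.0) for v in normal_like]

r_normal = straightness(normal_qq_points(normal_like))
r_skewed = straightness(normal_qq_points(skewed))
print(r_normal, r_skewed)
```

A markedly lower value for the skewed sample corresponds to the curved pattern described above, which is the usual cue for considering a transformation such as the logarithm before interpolation.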
2.3 Voronoi maps 

Voronoi diagrams, also known as Voronoi 
tessellations, are spatial partitions of a given 
area into regions based on the distance to a 
set of points called "seeds" or "sites." Each 
region in a Voronoi map represents the area 
closest to a particular seed compared to any 
other seed in the set, and these regions are 
referred to as Voronoi cells or polygons. 
Voronoi maps visually depict spatial 
relationships, illustrating how the study area 
is divided based on proximity to the seed 
points. Voronoi maps serve as a powerful tool 
for understanding spatial (Lu et al., 2022) 
relationships and find extensive applications 
in various fields. They efficiently partition 
space based on distance and are widely used 
in geography, cartography, spatial analysis, 
computer graphics, animation, art and design, 
and many other domains. The versatility of
Voronoi diagrams makes them invaluable for
analyzing spatial data and visualizing
proximity-based patterns in a given area.
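
The defining rule of a Voronoi cell, "assign every location to its nearest seed", can be sketched directly on a raster grid. This brute-force version is for illustration only (dedicated libraries construct the polygons far more efficiently); the function names and seed coordinates are hypothetical.

```python
import math

def nearest_seed(point, seeds):
    """Index of the seed closest to the point (Euclidean distance)."""
    return min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))

def rasterize_voronoi(seeds, nx, ny, width, height):
    """Label the centre of every grid cell with the index of its nearest seed."""
    grid = []
    for row in range(ny):
        y = (row + 0.5) * height / ny
        grid.append([nearest_seed(((col + 0.5) * width / nx, y), seeds)
                     for col in range(nx)])
    return grid

# Two hypothetical sampling sites in a 4 x 2 study area.
seeds = [(1.0, 1.0), (3.0, 1.0)]
grid = rasterize_voronoi(seeds, nx=4, ny=2, width=4.0, height=2.0)
for row in grid:
    print(row)
```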
2.4 Trend analysis 

Trend analysis (Mousavi et al., 2023) is a 
statistical technique that involves examining 
the pattern of data over time to identify 
consistent upward or downward movements, 
or other patterns, in the data series. This 
method is widely used in diverse fields, such 
as economics, finance, marketing, and 
environmental science (Bangroo et al., 2023), 
to gain insights into the historical behavior of 
a variable and make predictions about its 
future behavior. To quantify the trend in a 
data set, linear regression or exponential 
growth/decay models are often utilized. 
These models help in understanding the 
direction and magnitude of the trend. Trend 
analysis serves as a valuable tool to compare
present and past trends for a specific variable,
allowing for a better understanding of its
evolution over time. By analyzing historical
data trends, decision-makers can make
informed predictions and take appropriate
actions to respond to changing conditions and
plan for the future effectively. Trend analysis
enables the identification of important
patterns and can provide valuable insights
into the underlying factors influencing a
particular variable's behavior.

Figure 4: Spatial Maps of Temperature and Precipitation generated by using Kriging
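
Quantifying a linear trend, as mentioned above, amounts to an ordinary least-squares fit of the variable against time. The sketch below uses the closed-form slope and intercept; the function name and the yearly temperature values are invented for illustration.

```python
def linear_trend(times, values):
    """Least-squares slope and intercept of values against times."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    return slope, intercept

# Hypothetical mean annual temperature (degrees C) for five years.
years = [2018.0, 2019.0, 2020.0, 2021.0, 2022.0]
temperature = [14.0, 14.1, 14.2, 14.3, 14.4]

slope, intercept = linear_trend(years, temperature)
forecast_2023 = intercept + slope * 2023.0
print(slope, forecast_2023)
```

The sign of the slope gives the direction of the trend and its size gives the rate of change per time unit; extrapolating beyond the observed period, as in the last line, is only as reliable as the assumption that the trend persists.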
2.5 Semivariogram 

The semivariogram, also referred to as a variogram or semivariance
function, is a fundamental tool used in geostatistics (Dongare et al.,
2022; Fischer, 2019) to analyze spatial variability and quantify
spatial autocorrelation within a dataset. This statistical measure
illustrates how data points vary with their spatial separation or lag
distance. Essentially, the semivariogram reveals how the similarity of
data values changes with distance. To calculate the semivariogram, one
employs semivariance, which is half the average squared difference
between data points separated by a given lag distance. By doing so, it
quantifies the level of similarity or dissimilarity between data
points at a specific distance apart. When the lag distance is small,
the semivariance tends to be low since nearby points exhibit higher
similarity. However, as the lag distance increases, the semivariance
may increase up to a certain distance, the spatial autocorrelation
range. Beyond the range, the semivariance reaches a plateau, the sill,
indicating that points this far apart are no longer spatially
correlated.


The shape of the semivariogram assists
statisticians in identifying the appropriate
spatial model for interpolation or prediction.
Common models used to fit the
semivariogram include the exponential,
spherical, and Gaussian models. The
estimation of the semivariogram can also be
visualized through a covariance cloud
(Openshaw and Clark, 2019), which
represents the covariance between two
variables. Each point in the cloud corresponds
to a pair of data points, and its position on the
graph reflects their joint covariance. This
representation provides further insights into
spatial relationships and helps in
understanding the spatial structure of the
dataset.
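
The semivariance calculation described in this subsection (half the average squared difference of all point pairs falling in a lag class) can be sketched as follows. The function name, the equal-width lag bins, and the four-point transect are assumptions made for illustration.

```python
import math

def experimental_semivariogram(coords, values, lag_width, n_lags):
    """Return (lag centre, semivariance, pair count) for each non-empty lag bin."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            h = math.dist(coords[i], coords[j])
            k = int(h // lag_width)  # bin k covers [k * lag_width, (k + 1) * lag_width)
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    result = []
    for k in range(n_lags):
        if counts[k]:
            # Semivariance: half the mean squared difference within the bin.
            result.append(((k + 0.5) * lag_width, sums[k] / (2 * counts[k]), counts[k]))
    return result

# Hypothetical transect with one sample per unit distance.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
gamma = experimental_semivariogram(coords, values, lag_width=1.0, n_lags=3)
for lag, semivariance, pairs in gamma:
    print(lag, semivariance, pairs)
```

The pair count is returned alongside each semivariance because bins supported by only a few pairs are unreliable; fitting a theoretical model is normally done on the well-populated lags.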
2.6 A general quantile-quantile (Q-Q) plot 
A general quantile-quantile (Q-Q) plot is a 
graphical tool used to evaluate whether a 
dataset adheres to a particular probability 
distribution. Unlike the normal Q-Q plot, 
which specifically checks for normal 
distribution, the general Q-Q plot can be 
employed to assess the fit of data to various 
theoretical distributions. In a general Q-Q 
plot, if the dataset follows the target 
distribution, the points on the plot will 
approximately align along a straight line. 
Deviations from this straight line indicate a 
difference from the specified distribution. If 
the points closely follow the straight line, it 
suggests that the data is well-described by the 
chosen theoretical distribution. On the other 
hand, if the points deviate from the line in a 
systematic pattern, it indicates that the data 
differs from the target distribution. General 
Q-Q plots are invaluable tools in statistical 
analysis, as they provide a visual means to 
assess the goodness of fit between data and 
different theoretical distributions. They are 
particularly useful when determining the most 
appropriate distribution for modeling the data 
or when testing assumptions in statistical
methods that rely on specific distributions. By
employing general Q-Q plots, researchers can 
gain insights into the suitability of various 
distributions for representing their data and 
make informed decisions about the choice of 
statistical models (Lu et al., 2022; Nagaraj et 
al., 2023) and assumptions. 
2.7 Cross Variance 

Cross variance (Othmani et al., 2023;
Fischer, 2019) refers to the covariance
between two variables in a multivariate 
setting. It is a measure of how two variables 
vary together, capturing the degree of 
correlation or relationship between them. The 
cross variance is commonly used in statistical 
analysis to understand the association 
between two variables and to assess their 
joint behavior. To examine cross variance,
one can observe the cross-variance cloud,
which is a visual representation of the
covariance or correlation between the two
variables. This is achieved by plotting the
data points of both variables on a scatter plot,
with one variable on the x-axis and the other
on the y-axis. The resulting cloud of points
provides insights into the strength and
direction of their relationship. A straight,
linear-shaped cloud indicates a strong
positive or negative correlation between the
variables, while a scattered or elliptical cloud
suggests a weaker or no correlation. By
analyzing (Reza et al., 2016) the
cross-variance cloud, researchers can quickly assess
the level of association between the two
variables and make informed decisions about
their relationship in the dataset.
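
The numbers behind such a scatter cloud are the sample covariance and the correlation coefficient of the two variables. The sketch below computes both; the function names and the paired clay and cation-exchange-capacity values are hypothetical and deliberately perfectly linear.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled to the interval [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical paired measurements at four sampling points.
clay = [10.0, 20.0, 30.0, 40.0]
cec = [5.0, 9.0, 13.0, 17.0]

print(covariance(clay, cec), correlation(clay, cec))
print(correlation(clay, list(reversed(cec))))
```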
3. The deterministic way 

A deterministic method is an algorithm (Molla et al., 2023; Zakeri and
Mariethoz, 2021) or approach that consistently generates the same
output for a given input, regardless of the number of times it is
executed. It operates without any randomness or uncertainty, resulting
in a completely predictable and consistent outcome. This quality makes
deterministic methods highly valuable in fields such as computer
science, mathematics, physics, and engineering, where repeatability,
reliability, and precision are essential. Their key characteristics
include repeatability, predictability, the elimination of uncertainty,
and overall consistency. By offering reproducibility and stability,
deterministic methods ensure reliable and accurate results in various
applications, ranging from simulations to critical decision-making
processes. The deterministic method includes four subdivisions:

3.1 Inverse distance weighting (IDW)

Inverse distance weighting (IDW) is a 
widely used interpolation technique in spatial 
analysis and geostatistics for estimating 
values at unsampled locations based on 
nearby sampled data points. The fundamental 
assumption of IDW (AbdelRahman et al., 2020; Openshaw and Clark, 2019)
is that values at unsampled locations are influenced more by the
values of nearby points than those farther away. To achieve this, the
method employs a power parameter "p" that controls the influence of
nearby points on the estimation. Typically, "p" is set between 1 and
3, with higher values giving more weight to points closer to the
target location and lower values providing more equal weight to all
points. IDW is favored for its simplicity and intuitive nature, making
it straightforward to implement. However, it does have some
limitations. One such limitation is its sensitivity to the choice of
the power parameter, which can affect the interpolation results
significantly. Additionally, IDW tends to produce "bull's-eye"
artifacts around data points, particularly when the data is sparse or
unevenly distributed. As a result, IDW is often utilized for basic
interpolation tasks and serves as a baseline for more sophisticated
interpolation methods in GIS and spatial data analysis. These advanced
techniques (Molla et al., 2023) take into account additional factors,
such as spatial autocorrelation, spatial trends, and variogram models,
to achieve more accurate and robust interpolation results for complex
spatial datasets. Despite its limitations, IDW remains a valuable tool
in geospatial analysis, providing a quick and straightforward solution
for certain interpolation needs.
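
A minimal sketch of the IDW estimator follows, with weights proportional to 1/d^p. It uses every sample point rather than a search neighborhood, which is a simplification; the function name and the two-point example are hypothetical.

```python
import math

def idw(coords, values, target, power=2.0):
    """Inverse distance weighted estimate at the target location."""
    numerator = 0.0
    denominator = 0.0
    for xy, z in zip(coords, values):
        d = math.dist(xy, target)
        if d == 0.0:
            return z  # the target coincides with a sample: return its value
        weight = 1.0 / d ** power
        numerator += weight * z
        denominator += weight
    return numerator / denominator

# Two hypothetical samples; the target lies closer to the first one.
coords = [(0.0, 0.0), (2.0, 0.0)]
values = [10.0, 20.0]
target = (0.5, 0.0)

for p in (1.0, 2.0, 3.0):
    print(p, idw(coords, values, target, power=p))
```

Running the loop shows the estimate moving toward the nearer sample (value 10) as the power increases, which is the behavior of the power parameter described above.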
3.2 Global polynomial interpolation (GPI) 
Global polynomial interpolation (GPI) is 
an interpolation technique utilized to estimate 
values between known data points by fitting a 
polynomial function to the entire dataset. 
Unlike local interpolation methods, such as 
inverse distance weighting, GPI considers the 
entire dataset to create a single polynomial 
function that smoothly fits all the given data 
points. The objective of global polynomial interpolation is to find a
polynomial function that closely fits the provided data points,
allowing for the approximation of values at any point within the
dataset's range. While global polynomial interpolation offers
advantages, such as producing a smooth global approximation of the
entire dataset, it may not be suitable for datasets (Zakeri and
Mariethoz, 2021) with a high degree of noise or outliers. High-degree
polynomials can lead to oscillations and overfitting, where the
interpolation function becomes overly sensitive to individual data
points. The complexity and computational intensity of the
interpolation process increase with the degree of the polynomial used.
Therefore, selecting an appropriate polynomial degree becomes crucial
in balancing the need to capture the data's essential behavior while
avoiding overfitting. In practice, when more flexibility and
robustness are required in spatial data interpolation, other methods
such as spline interpolation or kriging are commonly employed. These
techniques provide more adaptive and smoother interpolations, making
them suitable for datasets with noise or outliers. By considering the
specific characteristics of the dataset, researchers can choose the
most appropriate interpolation method to achieve an accurate and
reliable estimation of values between data points.
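
As a rough illustration of fitting one polynomial to the whole dataset, the sketch below fits a first-order surface z = a + b·x + c·y by least squares. The least-squares criterion, the first-order degree, the small Gaussian-elimination helper, and all names and sample values are assumptions chosen to keep the example self-contained.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_plane(coords, values):
    """Least-squares coefficients (a, b, c) of z = a + b*x + c*y."""
    rows = [[1.0, x, y] for x, y in coords]
    # Normal equations: (A^T A) coefficients = A^T z
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, values)) for i in range(3)]
    return solve(ata, atz)

def predict_plane(coefficients, x, y):
    a, b, c = coefficients
    return a + b * x + c * y

# Hypothetical samples that are not exactly planar.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [0.0, 0.0, 0.0, 1.0]
plane = fit_plane(coords, values)
print(plane, [predict_plane(plane, x, y) for x, y in coords])
```

The fitted surface misses every sample by a small amount here, which shows that a low-order global fit captures the broad trend rather than honoring each observation.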

3.3 Radial basis function (RBF) 

Radial Basis Function (RBF) is a versatile mathematical function
widely utilized for interpolation, approximation, and smoothing of
data, especially in scenarios involving scattered data points in
multidimensional space. The core concept behind RBF (Singh and Sarma,
2023; Othmani et al., 2023) involves approximating a complex function
by a combination of simple functions known as basis functions, which
exhibit decaying behavior with distance from the center point. The
popularity of radial basis functions lies in their flexibility and
adaptability to complex and irregular data patterns. These functions
offer smooth and continuous interpolation, even in high-dimensional
spaces, making them well-suited for various applications. RBF
interpolation finds extensive use in data smoothing, image processing,
computer graphics, and numerical solutions of partial differential
equations. By employing RBF, researchers can effectively handle
problems with scattered data and achieve accurate estimations (Molla
et al., 2023) and approximations in multidimensional space. The
ability of radial basis functions to capture intricate data
relationships and provide seamless interpolation makes them a valuable
tool in data analysis and various computational fields.
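
The idea of combining simple distance-decaying functions can be sketched with a Gaussian basis function centered on every sample point. The choice of the Gaussian kernel, the shape parameter `epsilon`, the elimination helper, and the sample values are all assumptions for illustration; GIS packages offer several other kernels.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def gaussian_rbf(r, epsilon):
    """Basis function that decays with distance r from its centre."""
    return math.exp(-(epsilon * r) ** 2)

def rbf_fit(coords, values, epsilon=1.0):
    """Weights such that the weighted sum of basis functions matches every sample."""
    phi = [[gaussian_rbf(math.dist(a, b), epsilon) for b in coords] for a in coords]
    return solve(phi, values)

def rbf_predict(coords, weights, target, epsilon=1.0):
    return sum(w * gaussian_rbf(math.dist(c, target), epsilon)
               for c, w in zip(coords, weights))

# Hypothetical scattered samples.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 3.0, 5.0]
weights = rbf_fit(coords, values)
print(rbf_predict(coords, weights, (0.5, 0.5)))
```

Unlike the global polynomial fit, this surface passes through every sample point, because the weights are obtained by solving one equation per observation.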

3.4 Local polynomial interpolation (LPI)

Local polynomial interpolation (LPI) is an 
interpolation method (Fischer, 2019) 
designed to estimate values between known 
data points by fitting a polynomial function to 
a small subset of nearby data points. Unlike 
global polynomial interpolation (GPI), which 
considers the entire dataset to create a single 
polynomial function, LPI adapts the 
interpolation model for each target point 
based on its neighboring data points. In LPI, 
the key idea is to construct a polynomial 
function that better approximates the data 
around each target point by using a weighted 
average of nearby data points. The 
polynomial is usually of a low degree, such as 
linear or quadratic, to ensure smoothness and 
prevent overfitting. Each data point in the 
neighborhood of the target point is assigned a 
weight based on its distance from the target 
point. A kernel function is typically employed 
for this weighting, giving more weight to 
points closer to the target and less weight to 
points farther away. For each target point, a 
polynomial function, such as linear or 
quadratic, is fitted to the weighted subset of 
nearby data points using weighted least 
squares or other regression techniques. This
approach (Wang and Liu, 2023) allows LPI to
adapt to the changing data behavior more
effectively, making it particularly useful for
datasets with spatial or temporal
heterogeneity, where the underlying data
pattern varies across different regions or
periods. Local polynomial interpolation finds
widespread application in spatial data
analysis, geostatistics, and time series
analysis, where capturing local variations is
crucial for accurate predictions and
interpolation. However, the choice of the
bandwidth parameter (defining the size of the
neighborhood) and the degree of the local
polynomial can significantly impact the
quality of the interpolation. Hence, careful
selection of these parameters is essential to
achieving reliable and precise interpolation
results.
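
The kernel-weighted local fit can be sketched in one dimension, for example along a sampling transect. The sketch assumes a Gaussian kernel and a local linear (first-degree) polynomial fitted by weighted least squares; the function name, the bandwidth values, and the data are hypothetical.

```python
import math

def local_linear(xs, zs, x0, bandwidth):
    """Estimate z at x0 from a kernel-weighted linear fit centred on x0."""
    # Gaussian kernel: points near x0 receive the largest weights.
    ws = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    sw = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / sw
    z_mean = sum(w * z for w, z in zip(ws, zs)) / sw
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    sxz = sum(w * (x - x_mean) * (z - z_mean) for w, x, z in zip(ws, xs, zs))
    slope = sxz / sxx
    return z_mean + slope * (x0 - x_mean)

# Hypothetical transect with a curved (quadratic) pattern.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
zs = [x ** 2 for x in xs]

narrow = local_linear(xs, zs, 3.0, bandwidth=0.5)
wide = local_linear(xs, zs, 3.0, bandwidth=100.0)
print(narrow, wide)
```

With the narrow bandwidth the estimate stays close to the local value of 9, while the very wide bandwidth behaves almost like a single global line and drifts toward the overall mean, which illustrates why the bandwidth has to be chosen with care.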
4. The geostatistical way 

Geostatistical methods (Othmani et al., 
2023; Rajalakshimi et al., 2023; Lu et al., 
2022; Gangopadhyay and Reddy, 2022) 
comprise a set of statistical techniques 
specifically designed to analyze and model 
spatially correlated data. These methods hold 
significant value for applications in geology, 
environmental science, mining, agriculture, 
and other fields where data is collected across 
different geographic locations. Geostatistics 
takes into account the spatial dependence or 
autocorrelation that may exist in the data, 
enabling more accurate predictions and 
interpolation of values at unsampled locations. 
The primary objectives of geostatistical methods are to create spatial
maps, identify spatial patterns, estimate values at unsampled
locations, and quantify uncertainty in predictions. By considering the
spatial relationships between data points, geostatistical approaches
deliver robust and reliable results for decision-making in various
fields that heavily rely on spatial data analysis and prediction. Key
geostatistical techniques (Dongare et al., 2022; Gangopadhyay and
Reddy, 2022; Khan et al., 2021) include spatial autocorrelation,
variogram analysis, kriging (including ordinary kriging and universal
kriging), co-kriging, and geostatistical simulation. These methods
play crucial roles in understanding spatial patterns, predicting
unknown values, and managing spatially distributed resources
effectively. Overall, geostatistical methods provide powerful tools
for handling spatial data, enabling data-driven decision-making, and
facilitating informed actions in diverse fields where spatial analysis
and prediction are vital. The geostatistical method comprises kriging,
co-kriging, areal interpolation and empirical Bayesian kriging.
4.1 Kriging 

Kriging is a powerful geostatistical (Reza 
et al., 2016; Rajalakshimi et al., 2023; 
Mondal et al., 2021) interpolation method 
that delivers the best linear unbiased estimate 
of a variable at unsampled locations (Singh 
and Sarma, 2020). It incorporates both spatial 
correlation and uncertainty in the data, 
making it a robust and reliable interpolation 
technique. The fundamental principle 
underlying Kriging (Khan et al., 2021; 
Kingsley et al., 2019; Openshaw and Clark, 
2019) is to minimize prediction error by 
assigning appropriate weights to neighboring 
data points based on their spatial distance and 
correlation. The Kriging method assumes that
the spatial correlation in the data can be
modeled using a variogram (or
semivariogram). This variogram describes
how the variance of the variable changes with
the distance between data points. By using
the variogram model, Kriging can provide a 
continuous and spatially smooth surface, 
allowing for accurate estimation at unsampled 
locations. A notable advantage of Kriging is 
its ability to quantify the uncertainty in 
predictions. The method produces an 
estimation variance that indicates the level of 
uncertainty associated with the estimated 
values, providing valuable insights into the 
reliability of the predictions. Various variants 
of Kriging, including ordinary Kriging, 
simple Kriging, and universal Kriging, offer 
different levels of assumptions and 
complexity. Among these, ordinary Kriging is 
the most widely used approach for generating 
spatial variability maps of soil properties due 
to its superior performance compared to other 
approaches. In summary, Kriging is a highly 
effective geostatistical method that accounts 
for spatial correlation and _ uncertainty, 
enabling precise and reliable estimation of 
values at unsampled locations. Its ability to 
generate smooth surfaces and _ provide 
uncertainty measures makes it a popular 
choice for various applications in geology, 
environmental science, agriculture, and more. 
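
The weighting and variance calculation described above can be sketched in a
few lines. The example below is a minimal ordinary Kriging solver, assuming
NumPy is available and assuming a spherical variogram with hypothetical
parameters (nugget 0, sill 1, range 2); the function names and sample data
are illustrative and are not taken from any of the cited studies.

```python
import numpy as np

def spherical_variogram(h, nugget, sill, rng):
    """Spherical semivariogram; sill is the total sill and rng the range."""
    h = np.asarray(h, dtype=float)
    inside = nugget + (sill - nugget) * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    gamma = np.where(h < rng, inside, sill)
    return np.where(h == 0.0, 0.0, gamma)  # gamma(0) = 0 by definition

def ordinary_kriging(coords, values, target, variogram):
    """Return (estimate, kriging variance, weights) at one target location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Semivariances between all pairs of samples
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0  # last row and column enforce that the weights sum to one
    # Semivariances between each sample and the target
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1))
    solution = np.linalg.solve(A, b)
    weights, lagrange = solution[:n], solution[n]
    estimate = weights @ values
    variance = weights @ b[:n] + lagrange
    return estimate, variance, weights

def vg(h):
    # Hypothetical variogram parameters, for illustration only
    return spherical_variogram(h, nugget=0.0, sill=1.0, rng=2.0)

coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
values = [1.0, 2.0, 3.0, 4.0]
estimate, variance, weights = ordinary_kriging(coords, values, (0.5, 0.5), vg)
```

Because the target sits at the center of a symmetric layout, the four weights
are equal and the estimate is the sample mean. At a sampled location the
estimate reproduces the observation and the Kriging variance falls to zero.
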
4.2 Co-Kriging 

Co-Kriging is an extension of the
traditional Kriging method used for the
simultaneous interpolation of two or more
correlated variables. It is particularly
beneficial when there is a spatial correlation
between multiple variables, and utilizing this
correlation can enhance the accuracy of the
estimates. Co-Kriging becomes valuable in
situations where two or more variables
exhibit spatial relationships, and it can
provide more precise predictions compared to
using Kriging independently, especially when
data for one variable are sparse or missing.
Both Kriging and Co-Kriging are powerful
tools for spatial interpolation, enabling the
estimation of values at unsampled locations
while taking into account spatial correlation
and uncertainty.

The choice between Kriging (Singh and
Sarma, 2020) and Co-Kriging depends on the
characteristics of the data and the presence of
multiple correlated variables. When multiple
correlated variables are available, Co-Kriging
can leverage the spatial relationship between
them to improve the interpolation results. In
summary, Co-Kriging is a valuable
geostatistical technique that extends the
capabilities of traditional Kriging by allowing
for the joint estimation of multiple correlated
variables.

4.3 Areal interpolation 

Areal interpolation, also known as areal
weighting or areal disaggregation, is a
technique used to estimate and redistribute
data from one set of areal units to another set
of areal units whose boundaries do not
coincide with the first. The purpose
of areal interpolation is to harmonize spatial
data (Dongare et al., 2022; Bangroo et al.,
2023; Reza et al., 2016) that are available at
different geographic resolutions or
administrative boundaries. When using areal
interpolation, data representing an entire
geographic area (such as a country, region, or
municipality) are transferred to a different set
of geographic units, which may have different
shapes and sizes. The method involves
redistributing the data based on some
proportional relationship between the areas of
the source and target units. Areal
interpolation methods can vary depending on
the assumptions made about the spatial
relationship between the source and target
units. Common approaches include the areal
weighting method, which redistributes data
based on the proportional overlap of source
and target areas; dasymetric mapping, which
considers additional ancillary data to refine
the interpolation; and spatial interpolation
techniques like Inverse Distance Weighting
(IDW), which use the spatial proximity of
data points for redistribution. Areal
interpolation is essential for various
applications (Zakeri and Mariethoz, 2021),
such as harmonizing data from different
sources, aggregating data to a common
geographic scale, and generating consistent
spatial datasets for analysis and modeling
across different geographic units. It finds
extensive use in fields such as geography,
demography, environmental science, and
regional planning, where harmonizing and
integrating spatial data from diverse sources
is crucial for accurate analysis and
decision-making.
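
The areal weighting method described above can be illustrated with a minimal
sketch. It assumes, purely for simplicity, that all zones are axis-aligned
rectangles given as (xmin, ymin, xmax, ymax) and that the variable is a count
(such as population) spread uniformly within each source zone. The zone
layout and counts are hypothetical.

```python
def overlap_area(a, b):
    """Area shared by two axis-aligned rectangles (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)

def rect_area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def areal_weighting(source_zones, source_counts, target_zones):
    """Redistribute source counts to target zones in proportion to shared area."""
    estimates = []
    for t in target_zones:
        total = 0.0
        for s, count in zip(source_zones, source_counts):
            # Fraction of the source zone that falls inside this target zone
            total += count * overlap_area(s, t) / rect_area(s)
        estimates.append(total)
    return estimates

# Hypothetical layout: two source zones and two target zones with offset boundaries
source_zones = [(0, 0, 2, 2), (2, 0, 4, 2)]
source_counts = [100.0, 200.0]
target_zones = [(0, 0, 3, 2), (3, 0, 4, 2)]
estimates = areal_weighting(source_zones, source_counts, target_zones)
# estimates == [200.0, 100.0]; the overall total of 300 is preserved
```

Real zone boundaries are irregular polygons, so a GIS overlay operation would
replace the rectangle intersection used here. Dasymetric mapping would
replace the uniform-density assumption with weights derived from ancillary
data.
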
4.4 Empirical Bayesian kriging (EBK) 
Empirical Bayesian kriging (EBK) is an
advanced geostatistical interpolation method
that combines the principles of kriging and
Bayesian statistics to estimate values at
unsampled locations. This technique is an
extension of traditional kriging that accounts
for the uncertainty in the estimated spatial
variability of the data. In traditional kriging,
the variogram model, which measures spatial
correlation, is assumed to be known or is
estimated once from the data and then treated
as exact. In empirical Bayesian kriging, the
variogram model parameters are instead
treated as random variables: many variogram
models are estimated from repeated
simulations on subsets of the data rather than
a single model being assumed. This approach
provides greater flexibility in modeling
spatial correlation since variogram
parameters are estimated instead of assumed.
Empirical Bayesian kriging delivers more
reliable uncertainty estimates by considering
variogram uncertainty in the interpolation
process. It can handle situations where the
spatial variability of the data varies across
different regions, making it adaptable to
complex datasets. However, empirical
Bayesian kriging is more computationally
intensive than traditional kriging, and
variants that use auxiliary (covariate) data
require that such data be available, which
may not always be the case. Empirical Bayesian
kriging finds common application in
geostatistics (Lu et al., 2022), spatial data
analysis, and environmental modeling
(Mondal et al., 2021), particularly when
auxiliary information is available to enhance
interpolation accuracy and uncertainty
estimates. This method is particularly useful
for large datasets or situations where data are
collected at different spatial scales, providing
a powerful tool for spatial data analysis and
prediction in various fields.
5. Method of Interpolation with Barriers 
The Method of Interpolation with Barriers,
also known as constrained interpolation, is a
spatial interpolation technique that takes
barriers or constraints into account during the
interpolation process. Barriers refer to areas
in a geographic space where data values are
not continuous or where the underlying
phenomenon being interpolated is interrupted
or discontinuous. The primary objective of
interpolation with barriers is to generate a
smooth and continuous surface while
respecting the presence of barriers and
avoiding interpolation across them. This is
particularly important in situations where the
data or phenomenon being interpolated
should not be assumed to be continuous over
certain regions. Interpolation with barriers
becomes especially useful when certain
geographic features act as physical
boundaries, such as rivers (Lu et al., 2022),
mountains, or land use boundaries (Nagaraj et
al., 2023; Othmani et al., 2023). It is essential
to consider these barriers when estimating
values at unsampled locations to ensure
accurate and realistic results. This method
finds application in various fields, including
environmental modeling, hydrology, urban
planning, and natural resource management.
By incorporating spatial constraints,
interpolation with barriers helps create more
reliable and accurate interpolation results,
avoiding unrealistic interpolation across
physical barriers and providing a better
representation of the underlying spatial
patterns and variations.
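
One simple way to avoid interpolating across a barrier is sketched below. It
is a deliberately simplified illustration, not the algorithm of any
particular GIS package: inverse distance weighting is restricted to samples
whose straight line to the target does not cross a barrier segment. Barrier
tools in GIS software typically measure the shortest path around a barrier
instead. All names and data are hypothetical.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def crosses(p1, p2, b1, b2):
    """True if segment p1-p2 properly crosses segment b1-b2."""
    return (_orient(p1, p2, b1) * _orient(p1, p2, b2) < 0
            and _orient(b1, b2, p1) * _orient(b1, b2, p2) < 0)

def idw_with_barriers(samples, target, barriers, power=2.0):
    """IDW estimate from samples with an unobstructed line to the target."""
    num = den = 0.0
    for x, y, z in samples:
        if any(crosses((x, y), target, b1, b2) for b1, b2 in barriers):
            continue  # sample lies behind a barrier, so it is ignored
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0.0:
            return z  # target coincides with a sample
        w = d2 ** (-power / 2.0)
        num += w * z
        den += w
    return num / den if den > 0.0 else None  # None if every sample is blocked

# Hypothetical samples (x, y, value); a wall at y = 1 hides the third sample
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (1.0, 2.0, 100.0)]
target = (1.0, 0.0)
barriers = [((0.0, 1.0), (2.0, 1.0))]
blocked = idw_with_barriers(samples, target, barriers)
unblocked = idw_with_barriers(samples, target, [])
```

With the barrier in place only the two samples on the near side contribute,
so the estimate is their average. Without it, the distant high value pulls
the estimate upward.
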
5.1 Kernel smoothing 

Kernel smoothing, which includes kernel
regression (AbdelRahman et al., 2020; Zakeri
and Mariethoz, 2021) and kernel density
estimation, is a non-parametric statistical
technique widely used to estimate the
underlying smooth pattern of a dataset. This
method is commonly applied in data analysis
and visualization to reduce noise, reveal
underlying trends, and estimate probability
density functions for continuous data. The
fundamental concept behind kernel
smoothing is to approximate the values of a
function at a specific point by averaging the
observed data points, weighted by their
distance to that point. The weights are
determined by a kernel function, which is a
symmetric, non-negative function that
decreases as the distance from the point of
interest increases. The kernel function acts as
a smoothing window, and its choice impacts
the smoothness of the resulting estimate.
Kernel smoothing finds applications in
various fields, including signal processing,
geostatistics, image processing, and
environmental science. It is particularly
advantageous because it does not assume a
specific parametric model, making it flexible
and versatile for handling complex patterns
and noisy data. However, selecting an
appropriate kernel function and bandwidth
parameter is crucial to obtain meaningful and
accurate results in kernel smoothing. The
bandwidth parameter controls the width of
the smoothing window and influences the
level of smoothing applied to the data.
Cross-validation (Gökmen et al., 2023; Kingsley et
al., 2019; Bangroo et al., 2023) techniques are
often used to optimize the bandwidth
selection for a specific dataset and application,
ensuring that the resulting estimates are
reliable and well-suited to the data
characteristics. In conclusion, kernel
smoothing is a powerful non-parametric
technique that allows for the estimation of
smooth patterns in data without making
strong assumptions about the underlying
model. Its flexibility and versatility make it a
valuable tool in various fields for data
analysis, noise reduction, and probability
density estimation. Properly selecting the
kernel function and bandwidth parameter is
essential to ensuring the accuracy and
meaningfulness of the smoothed estimates.
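
The distance-weighted averaging and the cross-validated choice of bandwidth
described above can be sketched as follows. The example uses the
Nadaraya-Watson estimator with a Gaussian kernel, which is one common kernel
smoother and is assumed here for illustration. The data and the candidate
bandwidths are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Symmetric, non-negative weight that decays with distance."""
    return math.exp(-0.5 * u * u)

def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys around the point x0."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def loocv_error(xs, ys, bandwidth):
    """Mean squared leave-one-out prediction error for one bandwidth."""
    total = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        total += (ys[i] - nadaraya_watson(xs[i], rest_x, rest_y, bandwidth)) ** 2
    return total / len(xs)

# Hypothetical noisy observations along a transect
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]
candidates = [0.25, 0.5, 1.0, 2.0]

# Pick the candidate bandwidth with the lowest leave-one-out error
best_bandwidth = min(candidates, key=lambda h: loocv_error(xs, ys, h))
smoothed = [nadaraya_watson(x, xs, ys, best_bandwidth) for x in xs]
```

A larger bandwidth widens the smoothing window and flattens the estimate,
while a smaller one follows the data more closely. The leave-one-out error
gives a data-driven way to balance the two.
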


5.2 The diffusion kernel 


The diffusion kernel, also known as the
heat kernel (and, in Euclidean space,
equivalent to the Gaussian kernel), is a specific
type of kernel function employed in various
mathematical and computational methods,
such as machine learning, graph theory, and
image processing. Its name originates from its
connection to the heat equation in physics,
where it represents the diffusion of heat over
time. The diffusion kernel is constructed
based on a similarity matrix derived from the
dataset using techniques like the Gaussian
similarity function. This similarity matrix
measures the similarity or closeness between
data points. The diffusion kernel finds
application in diverse areas, including
graph-based machine learning, dimensionality
reduction, image processing, and spectral
clustering. It serves as a potent and versatile
tool for capturing the intrinsic structure and
relationships in complex datasets (Zakeri and
Mariethoz, 2021) and graphs. One of the key
advantages of the diffusion kernel is its
ability to handle both local and global
information diffusion, making it valuable for
tasks where understanding relationships
between data points at different scales is
crucial. Overall, the diffusion kernel is a
powerful and flexible approach widely used
in data analysis and machine learning tasks
due to its capacity to capture complex
relationships and patterns within datasets and
graphs. Its association with the heat equation
adds to its significance in various fields,
making it an essential tool in diverse
applications.
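
The construction described above, from a Gaussian similarity matrix to a
diffusion kernel, can be sketched as follows. The example assumes NumPy and
uses the common graph formulation K = exp(-beta L), where L is the graph
Laplacian of the similarity matrix. The parameter names sigma and beta and
the sample points are hypothetical.

```python
import numpy as np

def gaussian_similarity(points, sigma):
    """Similarity W_ij = exp(-||xi - xj||^2 / (2 sigma^2)) with zero diagonal."""
    pts = np.asarray(points, dtype=float)
    sq_dist = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def diffusion_kernel(W, beta):
    """K = exp(-beta * L), where L = D - W is the graph Laplacian."""
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)  # L is symmetric
    # Matrix exponential via the eigendecomposition of L
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T

# Hypothetical points: two close pairs that are far from each other
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
W = gaussian_similarity(points, sigma=1.0)
K = diffusion_kernel(W, beta=0.5)
```

The parameter beta plays the role of diffusion time. Small values keep the
kernel close to the identity, so only local structure matters, while larger
values let similarity spread further across the graph.
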
6. Limitations and challenges 

Spatial analysis also comes with
certain limitations, which include:
Limited data: For reliable predictions,
geostatistical approaches require a significant
number of soil samples. Data that are sparse or
inadequately dispersed might lead to
uncertainty and less reliable outcomes.

Data accuracy and quality: The accuracy
and quality of soil samples might have an
impact on the dependability of geostatistical
analysis. It is critical to ensure data quality
through adequate sampling practices and data
validation.

Outliers: Outlier data points can have a
substantial impact on the findings of
variogram modeling and interpolation.
Outliers must be identified and handled
correctly to ensure the robustness of the
analysis.

Assumptions of stationarity: Geostatistical
techniques make the assumption that the
spatial dependence of soil characteristics
remains constant over the whole study region.
However, the spatial correlation may vary
across the region, casting doubt on the
hypothesis of stationarity.

Model selection: Selecting appropriate
variogram models and interpolation
techniques can be a difficult process. The
selection of models and approaches has a
considerable impact on the accuracy of
predictions and interpretations.

Extrapolation: Making predictions outside
of the range of the collected data might be
erroneous and result in inaccurate conclusions.
Such predictions should be used with caution to
avoid overestimation.

Prediction uncertainty: Geostatistical
forecasts are subject to intrinsic uncertainty.
Understanding and quantifying uncertainty is
critical for making informed prediction-based
decisions.

Scale and resolution: The scale of the study
affects geostatistical analysis. The resolution
of spatial data might affect the conclusions,
and findings at one scale may not hold true at
another.

Anisotropy: Anisotropy occurs when the
spatial structure of soil characteristics
exhibits distinct patterns in various directions.
Including anisotropy in variogram modeling
can be difficult and may necessitate
additional considerations.

Spatial bias: Spatial data may suffer from
sampling bias, in which specific portions of
the study area are over-sampled or
under-sampled. The
representativeness of the forecasts can be
influenced by spatial bias.


7. Study outlook 

Several possible breakthroughs and
developments (Parker, 2023) in the field of
spatial modeling for environmental factors
are likely to alter the way spatial data are
understood and analyzed. Future
developments will most likely focus on
incorporating temporal aspects into spatial
models, allowing researchers to investigate
how environmental variables change over
time. Advancements in remote sensing
technology are enabling the collection of
high-resolution and multi-spectral data. This
allows for more accurate and realistic
depictions of environmental parameters such
as land cover, temperature, and vegetation.
Machine learning and deep learning
algorithms can extract patterns from these data
and use them to create more accurate spatial
models. Combining various types of data and
modeling approaches, such as simulation
models and observational data, will result in a
more comprehensive understanding of
environmental processes. Hybrid models will
require improved statistical methodologies to
successfully blend varied data sources.
Machine learning algorithms (Mousavi et al.,
2023) used with spatial data will provide
more accurate and interpretable models.
Spatial convolutional neural networks (CNNs)
and graph-based machine learning techniques
will be used to exploit complex spatial
patterns in environmental data.


8. Future perception 

Geostatistical methods are more
environmentally friendly than traditional
approaches as they optimize resource
utilization with minimum waste generation,
reduce the environmental effect, and deliver
more accurate solutions. Geostatistical
approaches can uncover geographical patterns
(Parker, 2023) and trends in environmental
data, allowing for early detection of
environmental changes or abnormalities. This
enables prompt mitigation efforts,
reducing potential harm to ecosystems and
minimizing long-term consequences. They can
also help identify appropriate locations (site
suitability studies) for renewable energy
installations such as solar panels and wind
turbines, maximizing energy production
while minimizing environmental impact. They
also help to improve climate modeling
accuracy by including spatial variability
(Wang, 2023). This enables more accurate
predictions of climate-related variables,
which aids in assessing the implications of
climate change and guiding adaptation efforts.
Generally, traditional approaches may yield
accurate answers without the added
complexity of spatial modeling in
circumstances where data are abundant and
spatial patterns are not prominent. Spatial
models, on the other hand, are likely to
exceed traditional methods in terms of
accuracy when dealing with environmental
data that exhibit spatial autocorrelation, local
variability, or irregular patterns. In practice,
spatial models are used in environmental
monitoring and remediation to monitor air
and water quality, predict pollutant dispersion,
and locate contaminated areas. They guide
effective site remediation procedures that
reduce exposure hazards and environmental
damage. Studies have reported that the
adoption of spatial models has led to
improvements in real-world scenarios. Spatial
models have been used in climate change
studies, seasonal changes mapping (Zhao et
al., 2018), epidemiology (Kieu et al., 2021)
and disease mapping (Abd El-Ghany et al.,
2020), mapping high-risk pollution zones
(Jumaah et al., 2023) and deforestation
patterns (Coetzee, 2022), identifying regions
at risk of soil erosion and guiding
conservation activities (Ghute et al., 2023),
and assessing soil nutrient levels, urban
planning, infrastructure development, and
resource utilization (Singh and Sarma, 2023).
They are also widely used in agriculture, for
example in fertilizer and irrigation
management, to minimize waste and increase
production. Spatial models have also been
used to estimate the scope
of natural disasters (Khan et al., 2023) such
as floods, landslides, and tsunamis. These
forecasts support early warning systems and
emergency response preparation. They are
also used in energy sector planning,
healthcare planning, and various conservation
methods.


9. Conclusions 

Geostatistics has emerged as an indispensable
and versatile tool for understanding spatial
relationships and making accurate predictions
in various scientific fields. In the context of
soil parameter analysis, geostatistics provides
a powerful means to comprehend the spatial
variability of soil properties.

Figure 5: Basic Applications of Geostatistical Analysis

By identifying spatial patterns and
quantifying uncertainties, geostatistics
facilitates informed decision-making for
sustainable land management and
environmental applications. The key to
harnessing the full potential of geostatistics
lies in careful data collection, preprocessing,
and validation. Ensuring the accuracy and
reliability of results requires diligent attention
to data quality and representative sampling
techniques. By capitalizing on the spatial
autocorrelation inherent in natural
phenomena, geostatistical methods offer
valuable insights that enable effective
management of natural resources,
environmental protection, and informed
decision-making across a broad spectrum of
applications. Nevertheless, it is imperative to
approach geostatistical analyses with caution,
select appropriate models, and account for
uncertainties to ensure robust and trustworthy
results. As technology continues to advance
and data collection techniques improve,
geostatistics will continue to play a vital role
in unraveling the complexities of soil systems
and promoting sustainable land management
practices. In conclusion, geostatistics has
revolutionized spatial data analysis, and its
application in soil parameter analysis holds
immense promise for enhancing our
understanding of soil variability and
supporting evidence-based decision-making
for a sustainable and resilient future.